Temporal Difference-based Adaptive policies in Neuro-dynamic Programming

Authors

  • T. IKI
  • M. HORIGUCHI
  • M. YASUDA
  • M. KURANO
Abstract

Based on the temporal difference method in neuro-dynamic programming, an adaptive policy for finite-state Markov decision processes with the average reward criterion is constructed under the minorization condition. The value function is estimated by a learning iteration algorithm, and the adaptive policy is specified as an ε-forced modification of the greedy policy with respect to the estimated value function and the estimated transition probability matrix. A numerical experiment on the “Toymaker’s problem” illustrates the validity of the adaptive policy.
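As a rough illustration of the ingredients named in the abstract, the following Python sketch combines an ε-forced greedy policy, an empirical estimate of the transition matrix, and a temporal-difference update for the average-reward setting. It is not the authors' algorithm: the differential TD update, the step sizes `alpha` and `beta`, the smoothed count-based estimate, and all function names are assumptions made for this example, and the minorization condition is not used. The two-state, two-action data follow the commonly quoted version of Howard's toymaker example (expected one-step rewards 6, 4, -3, -5); the paper's actual experimental setup is not shown on this page.

```python
import random

# Assumed data: commonly quoted toymaker example (two states, two actions).
# P[s][a] is the true transition row; R[s][a] is the expected one-step reward.
P = {0: {0: [0.5, 0.5], 1: [0.8, 0.2]},
     1: {0: [0.4, 0.6], 1: [0.7, 0.3]}}
R = {0: {0: 6.0, 1: 4.0},
     1: {0: -3.0, 1: -5.0}}
STATES, ACTIONS = (0, 1), (0, 1)

def normalize(counts):
    """Turn transition counts into an estimated transition matrix."""
    return {s: {a: [c / sum(counts[s][a]) for c in counts[s][a]]
                for a in ACTIONS} for s in STATES}

def greedy_action(s, h, p_hat):
    """Greedy action for estimated values h and estimated transitions p_hat."""
    def score(a):
        return R[s][a] + sum(p_hat[s][a][j] * h[j] for j in STATES)
    return max(ACTIONS, key=score)

def eps_forced_action(s, h, p_hat, eps, rng):
    """With probability eps pick a uniformly random action, else act greedily."""
    if rng.random() < eps:
        return rng.choice(ACTIONS)
    return greedy_action(s, h, p_hat)

def run(steps=5000, eps=0.1, alpha=0.05, beta=0.01, seed=0):
    """Simulate the adaptive loop; returns (h, rho, p_hat)."""
    rng = random.Random(seed)
    # Start every count at 1 so the initial estimate is uniform (an assumption).
    counts = {s: {a: [1, 1] for a in ACTIONS} for s in STATES}
    h, rho, s = [0.0, 0.0], 0.0, 0
    for _ in range(steps):
        a = eps_forced_action(s, h, normalize(counts), eps, rng)
        s_next = 0 if rng.random() < P[s][a][0] else 1
        counts[s][a][s_next] += 1
        # Differential TD error for the average-reward criterion.
        delta = R[s][a] - rho + h[s_next] - h[s]
        h[s] += alpha * delta
        rho += beta * delta
        s = s_next
    return h, rho, normalize(counts)
```

Calling `run()` returns the estimated relative values, the estimated average reward, and the estimated transition matrix. The forced exploration rate `eps` keeps every action sampled in every state, which is what lets the transition estimates improve for actions the greedy rule would otherwise never try.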


Similar resources

Quagent control via Passive and Active Learning

Artificial intelligence algorithms using passive and active learning versions of direct utility estimation, adaptive dynamic programming and temporal difference approaches are used to simulate an agent. The explored worlds consisted of discrete states (positions) bounded by internally generated “walls” that included one or more terminal states and a predetermined configuration of rewards for each state...


Design and Simulation of Adaptive Neuro Fuzzy Inference Based Controller for Chaotic Lorenz System

Chaos is a nonlinear behavior that shows chaotic and irregular responses to internal and external stimuli in dynamic systems. This behavior usually appears in systems that are highly sensitive to initial conditions. In these systems, stabilization is an important tool for eliminating aberrant behaviors. In this paper, the problems of chaos stabilization and tracking are investigated....


Dynamic Modeling of the Electromyographic and Masticatory Force Relation Through Adaptive Neuro-Fuzzy Inference System Principal Dynamic Mode Analysis

Introduction: Researchers have employed surface electromyography (EMG) to study the human masticatory system and the relationship between the activity of masticatory muscles and the mechanical features of mastication. This relationship has several applications in food texture analysis, control of prosthetic limbs, rehabilitation, and teleoperated robots. Materials and Methods: In this paper, w...


Control of Multivariable Systems Based on Emotional Temporal Difference Learning Controller

One of the most important issues in controlling delayed systems and non-minimum phase systems is to fulfill several objective orientations simultaneously and in the best way possible. In this paper, a new method is proposed that presents an objective orientation for controlling multi-objective systems. The principles of this method are based on emotional temporal difference learning, and has a...


The CFD Provides Data for Adaptive Neuro-Fuzzy to Model the Heat Transfer in Flat and Discontinuous Fins

In the present study, the Adaptive Neuro-Fuzzy Inference System (ANFIS) approach was applied to predict the heat transfer and air flow pressure drop on flat and discontinuous fins. The heat transfer and friction characteristics were experimentally investigated in four flat and discontinuous fins with different geometric parameters, including fin length (r), fin interruption (s), fin pitch (p), ...



Publication date: 2007